Yangon University of Economics 
Post Graduate Diploma in Research Studies (PGDRS) Programme
First Quarter Examination, September 2023 


PGDRS-112 
PGDRS (9th Batch) Exploratory Data Analysis
Answer Any Five Questions. Time Allowed: 3 hours 


1. (a) What are the measures of location? 
(b) The following table summarizes the distances, to the nearest km, that 134 examiners
travelled to attend a meeting in Yangon.

Distance (km)          41-45   46-50   51-60   61-70   71-90   91-150
Number of examiners        4      19      53      37      15        6


(i) Find the measures of central tendency for the above grouped frequency distribution.
(ii) Comment on the shape of the distribution.
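As a rough check on part (b), the grouped mean and grouped median can be sketched in Python. This is only an illustration under stated assumptions: class midpoints stand in for the individual distances, and class boundaries are taken 0.5 km either side of the stated limits. The variable names are illustrative and not part of the question.

```python
# Class limits (km) and frequencies copied from the table in part (b).
classes = [(41, 45), (46, 50), (51, 60), (61, 70), (71, 90), (91, 150)]
freqs = [4, 19, 53, 37, 15, 6]

n = sum(freqs)  # total number of examiners

# Grouped mean: weight each class midpoint by its frequency.
midpoints = [(lo + hi) / 2 for lo, hi in classes]
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Grouped median: interpolate inside the class containing the n/2-th value.
# Assumption: class boundaries sit 0.5 km either side of the stated limits.
cumulative = 0
for (lo, hi), f in zip(classes, freqs):
    if cumulative + f >= n / 2:
        lower_boundary = lo - 0.5
        width = (hi + 0.5) - lower_boundary
        grouped_median = lower_boundary + (n / 2 - cumulative) / f * width
        break
    cumulative += f

print(f"n = {n}, mean = {grouped_mean:.2f} km, median = {grouped_median:.2f} km")
```

Under these assumptions the mean comes out larger than the median, which is consistent with a right-skewed distribution. The mode is left out of the sketch because, with unequal class widths, the modal class is usually identified from frequency densities rather than raw frequencies.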


2. (a) What is the difference between the harmonic mean and the geometric mean?
(b) A carpenter buys $1500 worth of nails at $35 per pound, $1500 worth of nails at $25 
per pound and $1500 worth of nails at $15 per pound. Find the average cost of 1 pound of 
nails. 


(c) Given below are the price ratios for five commodities with the corresponding weights.

Commodity   Price Ratio   Weight
1           2.20          30
2           1.85          25
3           1.80          22
4           2.05          13
5           1.75          10

Calculate:
(i) Weighted mean and
(ii) Geometric mean.
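The averages asked for in parts (b) and (c) can be sketched numerically as follows. This is a minimal illustration, not a model answer. It assumes that the average cost in (b) is total dollars divided by total pounds (a harmonic mean, since equal amounts are spent at each price). It also assumes that "geometric mean" in (c) may be read either as weighted or as unweighted, so both are computed.

```python
import math

# Part (b): equal amounts ($1500) are spent at each price, so the average
# cost per pound is total dollars divided by total pounds (a harmonic mean).
prices = [35, 25, 15]
spend_each = 1500
total_pounds = sum(spend_each / p for p in prices)
avg_cost = spend_each * len(prices) / total_pounds

# Part (c): price ratios and weights from the table.
ratios = [2.20, 1.85, 1.80, 2.05, 1.75]
weights = [30, 25, 22, 13, 10]
total_w = sum(weights)

weighted_mean = sum(w * x for w, x in zip(weights, ratios)) / total_w

# Assumption: "geometric mean" may mean the weighted geometric mean,
# computed through logarithms; the unweighted version is shown as well.
weighted_gm = math.exp(
    sum(w * math.log(x) for w, x in zip(weights, ratios)) / total_w
)
simple_gm = math.exp(sum(math.log(x) for x in ratios) / len(ratios))

print(f"average cost per pound = {avg_cost:.2f}")
print(f"weighted mean = {weighted_mean:.2f}")
print(f"weighted GM = {weighted_gm:.4f}, unweighted GM = {simple_gm:.4f}")
```

The harmonic mean in (b) is lower than the simple average of the three prices (25) because more pounds are bought at the cheaper prices.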


3. (a) Define measures of variation. 


(b) The scores of two golfers are shown.

Golfer A   83 88 84 95 91 89 90 87 98 95
Golfer B   89 87 93 95 92 94 88 91 89 92

(i) Find the range of the scores for each golfer.
(ii) Which data set is more uniform?
(iii) Comment on your results.
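One possible way to compare the two golfers is sketched below. It assumes that "more uniform" is judged by the sample standard deviation and the coefficient of variation. The question does not name a specific measure, so this choice and the helper `summarize` are illustrative only.

```python
import statistics

golfer_a = [83, 88, 84, 95, 91, 89, 90, 87, 98, 95]
golfer_b = [89, 87, 93, 95, 92, 94, 88, 91, 89, 92]

def summarize(scores):
    """Return range, mean, sample standard deviation and coefficient of variation (%)."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample (n - 1) standard deviation
    return {
        "range": max(scores) - min(scores),
        "mean": mean,
        "sd": sd,
        "cv_percent": 100 * sd / mean,
    }

summary_a = summarize(golfer_a)
summary_b = summarize(golfer_b)

for name, s in (("A", summary_a), ("B", summary_b)):
    print(f"Golfer {name}: range = {s['range']}, mean = {s['mean']}, "
          f"sd = {s['sd']:.2f}, CV = {s['cv_percent']:.2f}%")
```

On every measure in this sketch Golfer B shows less spread than Golfer A, so B's scores would be described as the more uniform set.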


4. A sample of concrete specimens of a certain type is selected, and the compressive 
strength of each specimen is determined. The mean and standard deviation are calculated 
as 3000 and 500, and the sample histogram is found to be well approximated by a normal 
curve. 

(a) Approximately what percentage of the sample observations are between 2500 and
3500? 

(b) Approximately what percentage of sample observations is outside the interval from 
2000 to 4000? 

(c) What can be said about the approximate percentage of observations between 2000 and 
2500? 

(d) Why would you not use Chebyshev’s Rule to answer the questions posed in Parts (a)
to (c)?
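The Empirical Rule percentages behind parts (a) to (c) can be checked against an exact normal curve. The sketch below assumes the histogram is treated as exactly normal with mean 3000 and standard deviation 500, and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper `share_between` is an illustrative name.

```python
from statistics import NormalDist

# Assumption: the histogram is treated as exactly normal with the
# sample mean and standard deviation given in the question.
strength = NormalDist(mu=3000, sigma=500)

def share_between(low, high):
    """Fraction of a normal population lying between low and high."""
    return strength.cdf(high) - strength.cdf(low)

within_1sd = share_between(2500, 3500)          # part (a)
outside_2sd = 1 - share_between(2000, 4000)     # part (b)
between_2sd_and_1sd_below = share_between(2000, 2500)  # part (c)

print(f"(a) {within_1sd:.1%}  (b) {outside_2sd:.1%}  (c) {between_2sd_and_1sd_below:.1%}")
```

These values agree with the rounded Empirical Rule figures of about 68%, 5% and 13.5%. Chebyshev's Rule would only give much looser bounds, because it makes no use of the normal shape.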


5. (a) The average 30- to 39-year-old man is 69.6 inches tall, with a standard deviation of
3.8 inches, while the average 30- to 39-year-old woman is 64.1 inches tall, with a 
standard deviation of 3.0 inches. Who is relatively taller, a 73-inch man or a 68-inch 
woman? 

(b) A highly selective boarding school will only admit students who place at least 1.5 
standard deviations above the mean on a standardized test that has a mean of 200 and a 
standard deviation of 26. What is the minimum score that an applicant must make on the
test to be accepted?
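Both parts of this question rest on the z-score, z = (x - mean) / sd. A minimal sketch of the arithmetic follows; the function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above the mean."""
    return (x - mean) / sd

# Part (a): compare each person with their own reference group.
z_man = z_score(73, 69.6, 3.8)
z_woman = z_score(68, 64.1, 3.0)
relatively_taller = "woman" if z_woman > z_man else "man"

# Part (b): invert the z-score formula, x = mean + z * sd.
min_score = 200 + 1.5 * 26

print(f"z (man) = {z_man:.2f}, z (woman) = {z_woman:.2f}, "
      f"relatively taller: {relatively_taller}")
print(f"minimum score = {min_score}")
```

The comparison in (a) is made on z-scores rather than raw heights because the two groups have different means and standard deviations.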


6. The following data represent the hemoglobin (in g/dL) for 20 randomly selected cats. 


57 89 96 106 117 11 94 99 107 129 


78 95 100 110 130 87 96 103 112 134 


(a) Compute the z-score corresponding to the hemoglobin of Blackie, 78 g/dL. Interpret 
this result. 

(b) Determine all the quartiles.

(c) Compute and interpret the interquartile range, IQR. 


(d) Determine the lower and upper fences. Are there any outliers? 


7. (a) Write down the five-number summary.


(b) The principal surveyed 30 anonymous students to determine how many minutes a day
the students spend exercising. The results from the 30 anonymous students are shown.


10 40 60 30 60 10 45 30 300 90 


30 120 60 10 20 10 40 60 30 60 


10 45 30 350 90 30 120 60 10 20 


(i) What are the minimum and maximum values of the above data set? 
(ii) Determine the quartiles and IQR.
(iii) Construct a boxplot that shows outliers.
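The quantities needed for a boxplot of the exercise data above can be sketched as follows. The sketch assumes that quartiles are the medians of the lower and upper halves of the sorted data, and that outliers are flagged with the usual 1.5 × IQR fences. Other textbook quartile conventions may give slightly different values on other data sets.

```python
import statistics

minutes = [
    10, 40, 60, 30, 60, 10, 45, 30, 300, 90,
    30, 120, 60, 10, 20, 10, 40, 60, 30, 60,
    10, 45, 30, 350, 90, 30, 120, 60, 10, 20,
]

data = sorted(minutes)
half = len(data) // 2

# Assumption: quartiles are taken as the medians of the lower and upper
# halves of the sorted data; other textbook conventions can differ slightly.
q1 = statistics.median(data[:half])
q2 = statistics.median(data)
q3 = statistics.median(data[-half:])
iqr = q3 - q1

# 1.5 * IQR fences; values beyond them are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

five_number_summary = (data[0], q1, q2, q3, data[-1])

print("five-number summary:", five_number_summary)
print("IQR:", iqr, "fences:", (lower_fence, upper_fence), "outliers:", outliers)
```

With these values the box would run from Q1 to Q3 with a line at the median. The whiskers would stop at the smallest and largest observations inside the fences, and the flagged values would be drawn as separate points.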


8. The following data set shows the weights in pounds for the boys and girls in a class of 40
students:


Boy  | 166 166 167 167 168 168 168 168 168 169
     | 169 169 170 171 172 172 172 173 173 174

Girl | 161 161 162 162 163 163 163 165 165 165
     | 166 166 166 167 168 168 168 169 169 169


(a) Find the smallest and largest values, the median, and the first and third quartiles for the
boys.

(b) Find the smallest and largest values, the median, and the first and third quartiles for the
girls.

(c) Create a box plot for each set of data. Use one number line for both box plots.


(d) Which box plot has the widest spread?


